Improving POS Tagging Using Machine-Learning Techniques

Authors

  • Lluís Màrquez i Villodre
  • Horacio Rodríguez
  • Josep Carmona
  • Josep Montolio
Abstract

In this paper we show how machine learning techniques for constructing and combining several classifiers can be applied to improve the accuracy of an existing English POS tagger (Màrquez and Rodríguez, 1997). Additionally, the problem of data sparseness is addressed by applying a technique of generating convex pseudo-data (Breiman, 1998). Experimental results and a comparison to other state-of-the-art taggers are reported.
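
As a rough illustration of what combining several classifiers can look like for tagging, the sketch below takes the per-token majority vote of a few taggers. The abstract does not say which combination scheme the paper uses, so simple majority voting is an assumption here, and `combine_by_majority` and the three toy taggers are hypothetical names invented for this example.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A tagger maps a token sequence to one tag per token (assumed interface).
Tagger = Callable[[Sequence[str]], List[str]]


def combine_by_majority(taggers: Sequence[Tagger], tokens: Sequence[str]) -> List[str]:
    """Tag the tokens with every tagger and keep the most frequent tag per token.

    Ties are resolved in favour of the earliest tagger in the list.
    """
    predictions = [tagger(tokens) for tagger in taggers]
    combined = []
    for votes in zip(*predictions):  # one tuple of votes per token
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(next(tag for tag in votes if counts[tag] == best))
    return combined


# Toy taggers, purely illustrative (not the taggers from the paper).
def tagger_a(tokens: Sequence[str]) -> List[str]:
    return ["DT" if t.lower() == "the" else "NN" for t in tokens]


def tagger_b(tokens: Sequence[str]) -> List[str]:
    return ["DT" if t.lower() == "the" else "VB" if t.endswith("s") else "NN"
            for t in tokens]


def tagger_c(tokens: Sequence[str]) -> List[str]:
    return ["NN" for _ in tokens]


if __name__ == "__main__":
    print(combine_by_majority([tagger_a, tagger_b, tagger_c], ["the", "dog", "barks"]))
```

Voting per token is only one option; weighted voting or a second-level classifier trained on the taggers' outputs would fit the same interface.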

Similar resources

Improving POS Tagging Using Machine-Learning

In this paper we show how machine learning techniques for constructing and combining several classifiers can be applied to improve the accuracy of an existing English POS tagger (Màrquez and Rodríguez, 1997). Additionally, the problem of data sparseness is also addressed by applying a technique of generating convex pseudo-data (Breiman, 1998). Experimental results and a comparison to other sta...

A Part-of-Speech Tagging System for the Persian Language

Abstract: Part-of-speech (POS) tagging is an essential step for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

Part-of-Speech (POS) Tagging Revisited

Accurate part-of-speech (POS) tagging of natural language text data can add power to automated information retrieval and extraction. Brill's transformation-based learning (TBL) approach to automated POS tagging was introduced in 1992, combining virtues of rule-based and stochastic methods. Brill's innovative idea was to use machine learning techniques to search through all of rule space for the...

POS Tagger and Chunker for Tamil Language

This paper presents a part-of-speech tagger and chunker for Tamil built using machine learning techniques. Part-of-speech tagging and chunking are fundamental processing steps for any language processing task. Part-of-speech (POS) tagging is the process of automatically annotating each word in a corpus with its syntactic category. Chunking is the task of identifying and segmenting the text...

Web-Based Bengali News Corpus for Lexicon Development and POS Tagging

Lexicon development and part-of-speech (POS) tagging are very important for almost all natural language processing (NLP) applications. The rapid development of these resources and tools using machine learning techniques for less computerized languages requires an appropriately tagged corpus. We have used a Bengali news corpus, developed from the web archive of a widely read Bengali newspaper. The ...

Publication date: 1999